Web Document Segmentation for Better Extraction of Information: A Review
Similar Resources
Information Extraction from Document Images using Attention Based Layout Segmentation
Introduction: The attention of a human reader and the reading speed strongly depend on the layout of a document. The term layout is used for the geometrical arrangement of document components (i.e. text, graphics and figures) on the page as well as for the typographic features of the text (i.e. font type, style, size, alignment and line spacing). Although the human visual and cognitive perceptio...
An Ensemble Click Model for Web Document Ranking
Each year, web search engine providers spend more and more money on ranking documents in search engine result pages (SERPs). Click models provide useful information for ranking documents in SERPs by modeling interactions between users and search engines. Here, three modules are employed to create a hybrid click model; the first module is a PGM-based click model, the second module in a d...
Vision-Based Deep Web Data Extraction for Web Document Clustering
The design of web information extraction systems is becoming more complex and time-consuming. Detecting the data region is a significant problem for information extraction from web pages. In this paper, an approach to vision-based deep web data extraction is proposed for web document clustering. The proposed approach comprises two phases: 1) vision-based web data extraction, and 2) web documen...
Persian Printed Document Analysis and Page Segmentation
This paper presents a hybrid method, combining low-resolution and high-resolution analysis, for Persian page segmentation. In the low-resolution page segmentation, a pyramidal image structure is constructed for multiscale analysis and the document image is segmented into a set of regions. In the high-resolution page segmentation, each region is segmented into homogeneous regions by connected components analysis, and identifyi...
Multi-document Summarization for Terrorism Information Extraction
Counterterrorism is one of the major challenges facing society. In order to fight against terrorists, it is very important to have a thorough understanding of terrorism incidents. However, it is impossible for a human to read all the information related to a terrorism incident because of the large volume of information. Summarization techniques are urgently required for analysis of terroris...
Journal
Journal title: International Journal of Computer Applications
Year: 2015
ISSN: 0975-8887
DOI: 10.5120/19297-0734